Regression testing for wrapper maintenance

نویسنده

  • Nicholas Kushmerick
چکیده

Recent work on Internet information integration ~sumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site’s underlying content may change. We introduce RAPTURE, a fully-implemented, domain-independent verification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper’s expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Wrapper Generation and Maintenance

This paper investigates automatic wrapper generation and maintenance for Forums, Blogs and News web sites. Web pages are increasingly dynamically generated using a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning method to generate the wrapper from this kind of web pages. The tree alignment algorithm is adopted...

متن کامل

Wrapper Veriication

Many Internet information-management applications (e.g., information integration systems) require a library of wrappers, specialized information extraction procedures that translate a source's native format into a structured representation suitable for further application-speci c processing. Maintaining wrappers is tedious and error-prone, because the formatting regularities on which wrappers r...

متن کامل

Wrapper Maintenance: A Machine Learning Approach

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wr...

متن کامل

An Approach to Cost Effective Regression Testing in Black-Box Testing Environment

Regression testing is an expensive and frequently executed maintenance activity used to revalidate the modified software. As the regression testing is a frequently executed activity in the software maintenance phase, it occupies a large portion of the software maintenance budget. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. The current rese...

متن کامل

Regression Testing Cost Reduction Suite

The estimated cost of software maintenance exceeds 70 percent of total software costs [1], and large portion of this maintenance expenses is devoted to regression testing. Regression testing is an expensive and frequently executed maintenance activity used to revalidate the modified software. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. Tes...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999